A Simple and Efficient Estimation Method for Stream Expression Cardinalities
Authors
Abstract
Estimating the cardinality (i.e., the number of distinct elements) of an arbitrary set expression defined over multiple distributed streams is one of the most fundamental queries of interest. Earlier methods based on probabilistic sketches have focused mostly on the sketching algorithms. However, their estimators do not fully utilize the information in the sketches and thus are not statistically efficient. In this paper, we develop a novel statistical model and an efficient yet simple estimator for the cardinalities, based on a continuous variant of the well-known Flajolet-Martin sketches. Specifically, we show that, for two streams, our estimator has almost the same statistical efficiency as the Maximum Likelihood Estimator (MLE), which is known to be optimal in the sense of the Cramér-Rao lower bound under regularity conditions. Moreover, as the number of streams grows, our estimator remains computationally simple, whereas the MLE becomes intractable due to the complexity of the likelihood. Let N be the cardinality of the union of all streams, and |S| be the cardinality of the set expression S to be estimated. For a given relative standard error δ, the memory requirement of our estimator is O(δ^{-2} |S|^{-1} N log log N), which is superior to state-of-the-art algorithms, especially for large N and small |S|/N, where the estimation is most challenging.
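To make the idea of a continuous Flajolet-Martin sketch concrete, the following is a minimal sketch under stated assumptions, not the estimator proposed in the paper. It assumes each item is hashed to m pseudo-random Exp(1) values and that only the coordinate-wise minimum is stored; the helper names (exp_hash, sketch, union, estimate) and the use of SHA-256 as the hash are illustrative choices. For n distinct items each stored minimum is Exp(n)-distributed, so (m - 1) divided by the sum of the minima is an unbiased estimate of n, and the sketch of a union is the coordinate-wise minimum of the individual sketches.

```python
import hashlib
import math


def exp_hash(item, j):
    """Deterministically map (item, j) to a pseudo-random Exp(1) value."""
    digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
    # Use 52 bits so that u lies strictly inside (0, 1).
    k = int.from_bytes(digest[:8], "big") >> 12
    u = (k + 0.5) / 2**52
    return -math.log(u)


def sketch(stream, m=128):
    """Keep, for each of m hash functions, the minimum value seen.

    Duplicates hash to the same values, so they never change the sketch.
    """
    mins = [math.inf] * m
    for item in stream:
        for j in range(m):
            v = exp_hash(item, j)
            if v < mins[j]:
                mins[j] = v
    return mins


def union(a, b):
    """Sketch of the union of two streams: coordinate-wise minimum."""
    return [min(x, y) for x, y in zip(a, b)]


def estimate(mins):
    """Estimate the distinct count from the minima.

    The sum of m independent Exp(n) minima is Gamma(m, n), so
    (m - 1) / sum is unbiased for n (assumes m >= 2).
    """
    return (len(mins) - 1) / sum(mins)


if __name__ == "__main__":
    a = sketch(range(0, 600))      # 600 distinct items
    b = sketch(range(400, 1000))   # 600 distinct items, 200 shared with a
    print("estimated |A|      :", estimate(a))
    print("estimated |A u B|  :", estimate(union(a, b)))
    # Naive inclusion-exclusion baseline for the intersection; this is
    # NOT the paper's estimator and is noisy when |A n B| is small.
    print("estimated |A n B|  :",
          estimate(a) + estimate(b) - estimate(union(a, b)))
```

The inclusion-exclusion line at the end is only a baseline: its error grows with N rather than with the size of the intersection, which is the small |S|/N regime the abstract singles out as most challenging.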
Similar resources
A Signal Processing Approach to Estimate Underwater Network Cardinalities with Lower Complexity
This research examines a signal processing approach to estimating underwater network cardinalities. Cardinality estimation is of key importance for an underwater network, because the number of active nodes changes frequently for natural and artificial reasons under harsh underwater conditions. So, a proper estimation technique is mandat...
Estimation of Cadmium and Uranium in a stream sediment from Eshtehard region in Iran using an Artificial Neural Network
Considering the importance of Cd and U as pollutants of the environment, this study aims to predict the concentrations of these elements in a stream sediment from the Eshtehard region in Iran by means of a developed artificial neural network (ANN) model. The forward selection (FS) method is used to select the input variables and develop hybrid models by ANN. From 45 input candidates, 13 and 14 ...
A Simple Sample Preparation with HPLC-UV Method for Estimation of Clomipramine from Plasma
Clomipramine is a tricyclic antidepressant. Different methods for determination of clomipramine hydrochloride in plasma have been described. Most of these procedures favor the use of acidic back-extraction in extraction procedure and HPLC as the analytical technique. In this study, the clomipramine extraction procedure was modified and a direct injection to the column was performed to shorten t...
A SIMPLE MODEL FOR THE ESTIMATION OF DIELECTRIC CONSTANTS OF BINARY SOLVENT MIXTURES
A simple and reliable method for quick estimation of the dielectric constant of a binary solvent mixture is proposed. The validity of the proposed method has been tested for a broad range of binary solvent mixtures.
Journal title:
Volume, issue:
Pages: -
Publication date: 2007